 Bartholomew County



"A Big Bold Beautiful Journey" Is None of Those Things

The New Yorker

Kogonada's fantasy film, starring Colin Farrell and Margot Robbie, suggests that a great directorial talent is losing his way. In Kogonada's new film, Colin Farrell and Margot Robbie try gamely to overcome the thinness with which their characters have been imagined. If movies were given scores as figure skaters are, fantasy would start with a high rating for technical difficulty. The landings of the genre are hard to stick, because fantasy, by definition, isn't rooted in experience. No one has lived on a distant planet, in the far future, or any place where dragons or wizards rule--so, kudos to anyone who can make such realms feel truly lived in.


Evaluation of state-of-the-art deep learning models in the segmentation of the heart ventricles in parasternal short-axis echocardiograms

arXiv.org Artificial Intelligence

Previous studies on echocardiogram segmentation have focused on the left ventricle in parasternal long-axis views. In this study, deep-learning models were evaluated on the segmentation of the ventricles in parasternal short-axis echocardiograms (PSAX-echo). Segmentation of the ventricles in complementary echocardiogram views will allow the computation of important metrics with the potential to aid in diagnosing cardio-pulmonary diseases and other cardiomyopathies. Evaluating state-of-the-art models with small datasets can reveal whether they improve performance on limited data. PSAX-echo were performed on 33 volunteer women. An experienced cardiologist identified end-diastole and end-systole frames from 387 scans, and expert observers manually traced the contours of the cardiac structures. Traced frames were pre-processed and used to create labels to train 2 specific-domain (Unet-ResNet101 and Unet-ResNet50) and 4 general-domain (3 Segment Anything (SAM) variants and Detectron2) deep-learning models. The performance of the models was evaluated using the Dice similarity coefficient (DSC), Hausdorff distance (HD), and difference in cross-sectional area (DCSA). The Unet-ResNet101 model provided superior performance in the segmentation of the ventricles, with 0.83, 4.93 pixels, and 106 pixel² on average for DSC, HD, and DCSA, respectively. A fine-tuned MedSAM model provided a performance of 0.82, 6.66 pixels, and 1252 pixel², while the Detectron2 model provided 0.78, 2.12 pixels, and 116 pixel² for the same metrics, respectively. Deep-learning models are suitable for the segmentation of the left and right ventricles in PSAX-echo. This study demonstrated that specific-domain trained models such as Unet-ResNet provide higher accuracy for echo segmentation than general-domain segmentation models when working with small and locally acquired datasets.
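
The three evaluation metrics named in the abstract above can be illustrated with a minimal sketch on binary masks. This is not the authors' evaluation code: the function names are hypothetical, the Hausdorff distance is assumed here to be computed over all foreground pixels (the study may instead use traced contours), and the area difference is assumed to be the absolute difference in foreground pixel counts.

```python
import numpy as np

def dice_coefficient(pred, truth):
    """DSC = 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred, truth = np.asarray(pred, dtype=bool), np.asarray(truth, dtype=bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return float(2.0 * np.logical_and(pred, truth).sum() / total)

def hausdorff_distance(pred, truth):
    """Symmetric Hausdorff distance (in pixels) between foreground pixel sets.

    Brute-force pairwise distances: fine for small masks, memory-hungry for large ones.
    """
    a = np.argwhere(np.asarray(pred, dtype=bool))
    b = np.argwhere(np.asarray(truth, dtype=bool))
    if len(a) == 0 or len(b) == 0:
        raise ValueError("Hausdorff distance is undefined for an empty mask")
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Largest nearest-neighbour distance, taken in both directions.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

def area_difference(pred, truth):
    """Absolute difference in foreground area, in pixel² (pixel counts)."""
    pred, truth = np.asarray(pred, dtype=bool), np.asarray(truth, dtype=bool)
    return abs(int(pred.sum()) - int(truth.sum()))

# Toy example: a 4x4 ground-truth square and a prediction shifted right by one pixel.
truth = np.zeros((8, 8), dtype=bool)
truth[2:6, 2:6] = True
pred = np.zeros((8, 8), dtype=bool)
pred[2:6, 3:7] = True

print(dice_coefficient(pred, truth))
print(hausdorff_distance(pred, truth))
print(area_difference(pred, truth))
```

The toy example shows why the abstract reports all three numbers: a shifted mask can have zero area difference while still losing overlap, so no single metric captures segmentation quality on its own.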


Efficient Multi-Agent Collaboration with Tool Use for Online Planning in Complex Table Question Answering

arXiv.org Artificial Intelligence

Complex table question answering (TQA) aims to answer questions that require complex reasoning, such as multi-step or multi-category reasoning, over data represented in tabular form. Previous approaches demonstrated notable performance by leveraging either closed-source large language models (LLMs) or fine-tuned open-weight LLMs. However, fine-tuning LLMs requires high-quality training data, which is costly to obtain, and utilizing closed-source LLMs poses accessibility challenges and leads to reproducibility issues. In this paper, we propose Multi-Agent Collaboration with Tool use (MACT), a framework that requires neither closed-source models nor fine-tuning. In MACT, a planning agent and a coding agent that also make use of tools collaborate to answer questions. Our experiments on four TQA benchmarks show that MACT outperforms previous SoTA systems on three out of four benchmarks and that it performs comparably to the larger and more expensive closed-source model GPT-4 on two benchmarks, even when using only open-weight models without any fine-tuning. We conduct extensive analyses to prove the effectiveness of MACT's multi-agent collaboration in TQA.


What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective

arXiv.org Artificial Intelligence

What makes a difference in the post-training of LLMs? We investigate the training patterns of different layers in large language models (LLMs), through the lens of gradients, when training with different responses and initial models. We are specifically interested in how fast vs. slow thinking affects the layer-wise gradients, given the recent popularity of training LLMs on reasoning paths such as chain-of-thought (CoT) and process rewards. In our study, fast thinking without CoT leads to larger gradients and larger differences of gradients across layers than slow thinking (detailed CoT), indicating the learning stability brought by the latter. Moreover, pre-trained LLMs are less affected by the instability of fast thinking than instruction-tuned LLMs. Additionally, we study whether the gradient patterns can reflect the correctness of responses when training different LLMs using slow vs. fast thinking paths. The results show that the gradients of slow thinking can distinguish correct and irrelevant reasoning paths. As a comparison, we conduct similar gradient analyses on non-reasoning knowledge learning tasks, on which, however, trivially increasing the response length does not lead to behaviors similar to those of slow thinking. Our study strengthens the fundamental understanding of LLM training and offers new insights into its efficiency and stability, which pave the way towards building a generalizable System-2 agent. Our code, data, and gradient statistics can be found at: https://github.com/MingLiiii/Layer_Gradient.


Large Language Models for Autonomous Driving (LLM4AD): Concept, Benchmark, Simulation, and Real-Vehicle Experiment

arXiv.org Artificial Intelligence

With the broader usage and highly successful development of Large Language Models (LLMs), there has been a growth of interest and demand for applying LLMs to autonomous driving technology. Driven by their natural language understanding and reasoning ability, LLMs have the potential to enhance various aspects of autonomous driving systems, from perception and scene understanding to language interaction and decision-making. In this paper, we first introduce novel concepts and approaches to designing LLMs for autonomous driving (LLM4AD). Then, we propose a comprehensive benchmark for evaluating the instruction-following abilities of LLMs within the autonomous driving domain. Furthermore, we conduct a series of experiments on both simulation and real-world vehicle platforms, thoroughly evaluating the performance and potential of our LLM4AD systems. Our research highlights the significant potential of LLMs to enhance various aspects of autonomous vehicle technology, from perception and scene understanding to language interaction and decision-making.


Distilling Instruction-following Abilities of Large Language Models with Task-aware Curriculum Planning

arXiv.org Artificial Intelligence

The process of instruction tuning aligns pre-trained large language models (LLMs) with open-domain instructions and human-preferred responses. While several studies have explored autonomous approaches to distilling and annotating instructions from more powerful proprietary LLMs, such as ChatGPT, they often neglect the impact of task distributions and the varying difficulty of instructions of the training sets. This oversight can lead to imbalanced knowledge capabilities and poor generalization powers of small student LLMs. To address this challenge, we introduce Task-Aware Curriculum Planning for Instruction Refinement (TAPIR), a multi-round distillation framework with balanced task distributions and dynamic difficulty adjustment. This approach utilizes an oracle LLM to select instructions that are difficult for a student LLM to follow and distill instructions with balanced task distributions. By incorporating curriculum planning, our approach systematically escalates the difficulty levels, progressively enhancing the student LLM's capabilities. We rigorously evaluate TAPIR using two widely recognized benchmarks, including AlpacaEval 2.0 and MT-Bench. The empirical results demonstrate that the student LLMs, trained with our method and less training data, outperform larger instruction-tuned models and strong distillation baselines. The improvement is particularly notable in complex tasks, such as logical reasoning and code generation.


Federated Learning for Connected and Automated Vehicles: A Survey of Existing Approaches and Challenges

arXiv.org Artificial Intelligence

Machine learning (ML) is widely used for key tasks in Connected and Automated Vehicles (CAV), including perception, planning, and control. However, its reliance on vehicular data for model training presents significant challenges related to in-vehicle user privacy and the communication overhead generated by massive data volumes. Federated learning (FL) is a decentralized ML approach that enables multiple vehicles to collaboratively develop models, broadening learning from various driving environments, enhancing overall performance, and preserving the privacy and security of local vehicle data. This survey paper presents a review of the advancements made in the application of FL for CAV (FL4CAV). First, centralized and decentralized frameworks of FL are analyzed, highlighting their key characteristics and methodologies. Second, diverse data sources, models, and data security techniques relevant to FL in CAVs are reviewed, emphasizing their significance in ensuring privacy and confidentiality. Third, specific applications of FL are explored, providing insight into the base models and datasets employed for each application. Finally, existing challenges for FL4CAV are listed, and potential directions for future investigation to further enhance the effectiveness and efficiency of FL in the context of CAV are discussed.
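
To make the collaborative-training idea concrete, here is a minimal sketch of the aggregation step in a centralized FL round. It uses the widely known FedAvg-style weighted average as an illustrative assumption; the abstract does not say which aggregation rules the survey covers, and the function name, parameter layout, and sample counts below are hypothetical.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Average per-client parameters, weighted by each client's local sample count.

    client_weights: list of dicts mapping parameter name -> np.ndarray
    client_sizes:   list of local dataset sizes, one per client
    """
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one non-negative size per client and at least one client")
    total = float(sum(client_sizes))
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    return {
        name: sum(w[name] * (n / total) for w, n in zip(client_weights, client_sizes))
        for name in client_weights[0]
    }

# Two hypothetical vehicles: raw driving data stays on board, and only
# the locally trained parameters are sent for aggregation.
vehicle_a = {"w": np.array([1.0, 2.0])}
vehicle_b = {"w": np.array([3.0, 6.0])}
global_model = federated_average([vehicle_a, vehicle_b], client_sizes=[1, 3])
print(global_model["w"])
```

Weighting by sample count lets vehicles with more local data influence the shared model proportionally, while the server never sees the underlying sensor data.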


Pre-Deployment Testing of Low Speed, Urban Road Autonomous Driving in a Simulated Environment

arXiv.org Artificial Intelligence

Low-speed autonomous shuttles emulating SAE Level 4 automated driving using human-driver-assisted autonomy have been operating in geo-fenced areas in several cities in the US and the rest of the world. These autonomous vehicles (AVs) are operated by small to mid-sized technology companies that do not have the resources of automotive OEMs for carrying out exhaustive, comprehensive testing of their AV technology solutions before public road deployment. Because these shuttles operate at low speed and therefore not on highways, their base vehicles are not required to go through rigorous certification tests. The way driver-assisted AV technology is tested and approved for public road deployment is continuously evolving, but it is not standardized and differs between the states where these vehicles operate. Currently, AVs and AV shuttles deployed on public roads are using these deployments for testing and improving their technology. However, this is not the right approach. Safe and extensive testing in a lab and controlled test environment, including Model-in-the-Loop (MiL), Hardware-in-the-Loop (HiL), and Autonomous-Vehicle-in-the-Loop (AViL) testing, should be the prerequisite to such public road deployments. This paper presents three-dimensional virtual modeling of an AV shuttle deployment site and simulation testing in this virtual environment. There are two deployment sites for these AV shuttles in Columbus, established through Smart Columbus, the Department of Transportation-funded Smart City Challenge project. The Linden residential area AV shuttle deployment site of Smart Columbus is used as the specific example for illustrating the AV testing method proposed in this paper.


Judge Me in Context: A Telematics-Based Driving Risk Prediction Framework in Presence of Weak Risk Labels

arXiv.org Artificial Intelligence

Driving risk prediction has been a topic of much research over the past few decades, with the aim of minimizing driving risk and increasing safety. The use of demographic information in risk prediction is a traditional solution with applications in insurance planning; however, it is difficult to capture true driving behavior via such coarse-grained factors. Therefore, the use of telematics data has gained widespread popularity over the past decade. While most of the existing studies leverage demographic information in addition to telematics data, our objective is to maximize the use of telematics as well as contextual information (e.g., road type) to build a risk prediction framework with real-world applications. We contextualize telematics data in a variety of forms, and then use it to develop a risk classifier, assuming that there are some weak risk labels available (e.g., past traffic citation records). Before building the risk classifier, though, we employ a novel data-driven process to augment the weak risk labels. Extensive analysis and results based on real-world data from multiple major cities in the United States demonstrate the usefulness of the proposed framework.